[XeTeX] Re: New feature request for XeTeX
Jonathan Kew
jonathan_kew at sil.org
Tue Jul 27 10:11:29 CEST 2004
On 27 Jul 2004, at 3:21 am, Ross Moore wrote:
>
> On 27/07/2004, at 4:31 AM, Somadevah at aol.com wrote:
>
>>
>> In a message dated 26/7/04 4:45:59 pm, ross at ics.mq.edu.au writes:
>>
>> > If you need something more than a simple character code remapping
>> > for certain characters, perhaps those instances could be handled as
>> > \active characters in TeX, while the "mapping=...." option would
>> > allow you to remap the majority of simple characters without having
>> > to \activate just about everything. Reasonable compromise?
>>
>> I suspect this might prove far more valuable than it first looks.
>> Would it not allow one to remap from one script to another? So that
>> one might type Hindi in unicode Roman transliteration and then be
>> able to output it either as Roman transliterated text or Devanagari,
>> depending on the "mapping" used for an environment?
>
> Yes, indeed.
> This is precisely one of the possible uses of a "font mapping".
>
> Another, more mundane, is a simple solution for the "--" and "---"
> *lack of ligature* problem:
>
> "---" could be mapped to ^^^^2014 (em dash)
> "--" could be mapped to ^^^^2013 (en dash)
>
> and similarly for other TeX-specific ligature sequences.
>
This all depends on what level of mapping is provided. My initial
impression from your suggestion, Ross, was that we'd have a simple
remapping of individual character codes, somewhat analogous to the .map
files used by ttf2pt1, for example. This would allow each individual
legacy codepoint to be mapped to a desired Unicode character in the
font; but it wouldn't provide a solution for ligatures. And it might
well be inadequate for many kinds of transliteration, where the
relationship between scripts is not always a simple one-to-one
correspondence.
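
To make that limitation concrete, here is a minimal sketch in Python of a
purely one-to-one remapping. It is only an illustration, not anything XeTeX
provides: the table LEGACY_TO_UNICODE and the function remap are invented
names, and the two table entries are arbitrary examples. Each character is
replaced on its own, so a sequence such as "--" passes through unchanged.

```python
# Hypothetical one-to-one table: input codepoint -> Unicode character.
# The entries are illustrative only; a real table would cover a whole
# legacy encoding.
LEGACY_TO_UNICODE = {
    0x60: "\u2018",  # ` -> left single quotation mark
    0x27: "\u2019",  # ' -> right single quotation mark
}


def remap(text: str) -> str:
    """Replace each character independently; no context is consulted."""
    return text.translate(LEGACY_TO_UNICODE)


if __name__ == "__main__":
    print(remap("`quoted'"))  # single characters are remapped
    print(remap("a--b"))      # the "--" sequence is left untouched
```

Because the lookup is keyed on a single codepoint, there is nowhere in such a
table to say that two hyphens together should become one en dash.
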
I'm also toying with ideas of more powerful mappings; it happens that I
have a character-mapping engine, TECkit (see
http://scripts.sil.org/teckit) that we could perhaps press into
service. TECkit supports many-to-many mappings, contextually-determined
mappings, even reordering of code sequences. It was developed primarily
to support complex mappings between legacy byte encodings and Unicode,
but can also operate as a transducer entirely within Unicode, which is
what would be needed here as XeTeX will already have interpreted the
input text as Unicode when it was initially read from the file.
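
As a rough illustration of what a many-to-one mapping within Unicode
involves, here is a small Python sketch using the "--" and "---" examples
quoted above. It is not TECkit and does not use TECkit's mapping language;
RULES and apply_rules are invented names, and the sketch handles neither
contextual rules nor reordering. It simply tries the longest rule first at
each position.

```python
# Hypothetical sequence rules, Unicode in and Unicode out.
RULES = {
    "---": "\u2014",  # em dash
    "--": "\u2013",   # en dash
}


def apply_rules(text: str, rules: dict = RULES) -> str:
    """Rewrite text left to right, preferring the longest matching rule."""
    keys = sorted(rules, key=len, reverse=True)  # longest sequences first
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(rules[key])
                i += len(key)
                break
        else:
            # No rule matched here: copy the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(apply_rules("pages 10--12 --- roughly"))
```

Trying the longest sequence first is what keeps "---" from being consumed as
an en dash followed by a stray hyphen.
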
Anyway, it's an interesting concept and may actually be implementable,
too.... stay tuned.... but don't hold your breath. :-)
Jonathan