[XeTeX] Re: New feature request for XeTeX

Ross Moore ross at maths.mq.edu.au
Tue Jul 27 13:10:09 CEST 2004

Hi Jonathan,

On 27/07/2004, at 6:11 PM, Jonathan Kew wrote:

> On 27 Jul 2004, at 3:21 am, Ross Moore wrote:

>> Another, more mundane, is a simple solution for the "--" and "---"
>>  *lack of ligature* problem:
>>    "---" could be mapped to ^^^^2014 (em dash)
>>    "--"  could be mapped to ^^^^2013 (en dash)
>> and similarly for other TeX-specific ligature sequences.
> This all depends what level of mapping is provided. My initial 
> impression from your suggestion, Ross, was that we'd have a simple 
> remapping of individual character codes, somewhat analogous to the 
> .map files used by ttf2pt1, for example. This would allow each 
> individual legacy codepoint to be mapped to a desired Unicode 
> character in the font; but it wouldn't provide a solution for 
> ligatures. And it might well be inadequate for many kinds of 
> transliteration, where the relationship between scripts is not always 
> a simple one-to-one correspondence.

A 1--1 mapping would indeed be easiest to implement, and would be
adequate for T3 encoding, as in tipa.sty, which was my initial
motivating example.

A  1--many  mapping would then be a trivial extension, e.g. for accents
or struck-through characters built from Unicode 'combining' characters
(hypothetical examples only):
   B --> B^^^^0338   (B with slash)
   b --> b^^^^0337   (b with slash)
where there is no single code-point to do the job.
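
A minimal sketch of such a 1--1 / 1--many table, written in Python purely for illustration (it is not anything XeTeX provides). The names CHAR_MAP and remap are hypothetical, and the two entries are just the hypothetical examples above:

```python
# Hypothetical per-character remapping table (illustrative entries only).
# Each key is a single input character; each value is the Unicode string
# that replaces it, which may be one code point or several.
CHAR_MAP = {
    "B": "B\u0338",  # B followed by COMBINING LONG SOLIDUS OVERLAY
    "b": "b\u0337",  # b followed by COMBINING SHORT SOLIDUS OVERLAY
}


def remap(text, table=CHAR_MAP):
    """Replace each character independently; unmapped characters pass through."""
    return "".join(table.get(ch, ch) for ch in text)
```

Because every input character is looked up on its own, no ordering of rules is needed at this level.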

A  many--1  or  many--many  map is certainly a bit harder.
It requires:
   (i) the rules to be ordered  (e.g.  ---  must be tested and applied
       before  -- )
  (ii) proper integration with the hyphenation routine.
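
Point (i) can be sketched as follows, again in Python and only as an illustration under assumptions: RULES and apply_rules are hypothetical names, and the rule list holds just the two dash ligatures mentioned earlier in this thread.

```python
# Hypothetical ordered rule list: earlier rules take priority, so the
# longer sequence "---" is listed before its prefix "--".
RULES = [
    ("---", "\u2014"),  # em dash
    ("--", "\u2013"),   # en dash
]


def apply_rules(text, rules=RULES):
    """Scan left to right; at each position apply the first rule that matches.

    Rule sources are assumed to be non-empty strings.
    """
    out = []
    i = 0
    while i < len(text):
        for source, target in rules:
            if text.startswith(source, i):
                out.append(target)
                i += len(source)
                break
        else:
            # No rule matched here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)
```

If the two rules were listed in the opposite order, "---" would come out as an en dash followed by a plain hyphen, which is why the ordering matters.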

For (ii) a definite rule is that there cannot be hyphenation between the
code-points returned by a  *--many  mapping replacement.
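
One way to picture this rule, as a rough Python sketch only: while applying the replacements, record which positions in the output fall inside a single replacement, so that a later hyphenation pass can skip them. The function name apply_with_nobreak and the index-set representation are assumptions made for this example; they say nothing about how a TeX engine actually represents break points.

```python
def apply_with_nobreak(text, rules):
    """Apply ordered (source, target) rules to text.

    Returns (output, forbidden), where forbidden is the set of indices k
    such that no break may occur between output[k-1] and output[k].
    Rule sources are assumed to be non-empty strings.
    """
    out = []
    forbidden = set()
    i = 0
    while i < len(text):
        for source, target in rules:
            if text.startswith(source, i):
                start = len(out)
                out.extend(target)
                # Boundaries strictly inside the replacement are not
                # permitted break points.
                forbidden.update(range(start + 1, start + len(target)))
                i += len(source)
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out), forbidden
```

For instance, with the single hypothetical rule ("B", "B\u0338"), the input "aBc" gives a four-character output in which only the boundary between "B" and the combining overlay is marked as forbidden.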

Apart from (i) and (ii), I don't see much difficulty in this,
and ligatures could then be handled very easily.
Of course I'm not looking from the same view-point as you, so I defer
to your experience in programming this kind of thing for a TeX engine.

> I'm also toying with ideas of more powerful mappings; it happens that 
> I have a character-mapping engine, TECkit (see 
> http://scripts.sil.org/teckit) that we could perhaps press into 
> service. TECkit supports many-to-many mappings, 
> contextually-determined mappings, even reordering of code sequences.

Wow; that's more than I was requesting --- at this stage!

> It was developed primarily to support complex mappings between legacy 
> byte encodings and Unicode, but can also operate as a transducer 
> entirely within Unicode, which is what would be needed here as XeTeX 
> will already have interpreted the input text as Unicode when it was 
> initially read from the file.

Looking at your TECkit overview, you seem to have already addressed the
kind of problem that I'm trying to solve. (No, I didn't already know of
this work!)

> Anyway, it's an interesting concept and may actually be implementable, 
> too.... stay tuned.... but don't hold your breath. :-)

It's more than just interesting.
I think that it is indispensable, for the preservation of the ability
to interpret old documents, and continued use of well-established
encoding formats.

   (i) New software needs to have the ability to interpret old data.

  (ii) Old, easy-to-use encoding formats will retain a life, so long as
       the computers that they were designed for are still in service
       and/or people who know how to use them are still active.
       That may be for decades hence. Compatibility with more modern
       software will be needed by archivists & researchers for even
       longer than that.
I find your response most promising indeed.

In the TeX world, holding your breath is never a good policy.  :-)

All the best,


> Jonathan
Ross Moore                                         ross at maths.mq.edu.au
Mathematics Department                             office: E7A-419
Macquarie University                               tel: +61 +2 9850 8955
Sydney, Australia                                  fax: +61 +2 9850 8114
