[XeTeX] Re: New feature request for XeTeX

Mon Jul 26 13:43:57 CEST 2004

Hi Ross,

Interesting idea; I'll think about it a bit. Some initial comments 
below....

> What I have in mind could be called a "font mapping",
> or "encoding mapping".
> This would be applied on output, whenever a particular font-variant
> is being used, both when obtaining the size of a sequence of
> letters/glyphs and in the final output.

(Just an aside: seems to me that this is closer to the input side of 
things than the output.) In effect, you're asking for the ability to 
apply a custom mapping from 7-bit codes to Unicode values, specified as 
part of the \font declaration. Is that roughly the idea?

> What I want to do is to be able to continue to use the old TeX sources,
> but to end up with the Unicode code-points in the output

In principle, of course, you can do this by \catcoding the custom input 
characters as active chars, and then \defining them as macros that 
generate the appropriate Unicode codepoints. But that gets really 
cumbersome if you need lots of them, and also need to use the same 
(input) characters within control sequences, etc.
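
For a single character, a minimal sketch of that approach might look 
something like this (plain TeX under XeTeX, untested; the names \ipa, 
\ipaarg and \ipafont are made up for illustration, and I'm assuming 
the TIPA convention that "@" stands for schwa, U+0259):

	\font\ipafont="Lucida Grande" at 10pt
	% Define @ as an active character producing U+0259 (schwa).
	% The catcode change is local to the group; \gdef keeps the definition.
	\begingroup
	  \catcode`\@=\active
	  \gdef@{\char"0259 }
	\endgroup
	% \ipa (hypothetical name) switches font and makes @ active *before*
	% its argument is read, so the argument is tokenized with the new catcode.
	\def\ipa{\begingroup \ipafont \catcode`\@=\active \ipaarg}
	\def\ipaarg#1{#1\endgroup}
	\ipa{@b@}
	\bye

Note that this stops working as soon as \ipa{...} appears inside the 
argument of another macro, because the catcodes are already fixed by 
then; and every further character needs its own \catcode and \gdef.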

> Having a font-mapping available, there is an obvious way forward,
> that would alleviate completely the need for pre-processors, and
> be easier for a user to configure for his/her own needs.
>
> The \textipa command would cause the mapping to be applied to the
> result (*after* macro-expansions) of the usual processing of its
> argument.
> Currently \textipa sets the font encoding to T3 for its processing,
> and switches font to have the correct metrics available.

Not having touched LaTeX in many years (I used 2.09 once upon a time), 
I don't know how input encodings, font encodings, etc. are handled by 
current LaTeX packages. What does "sets the font encoding to T3" 
actually do, in TeX terms?

>
> With XeTeX, the change would be to an AAT font; but there needs
> to be an appropriate mapping either:
>
>   (i) of 7-bit characters directly to Unicode points;
> or
>   (ii) of 7-bit characters to (La)TeX macros, and further processing
>       to get the Unicode points.
>
> The latter would be more flexible, I think -- though perhaps harder
> to integrate seamlessly into TeX's workflow.

So for (i) do you envisage an extension to the \font command, perhaps 
something like:

	\font\tipalucida="Lucida Grande:mapping=tipa" at 10pt

which would load an external file such as "tipa.xmp" (the extension 
.map is used for too many things already!), containing a mapping to be 
applied to character codes when using this font?

Or should the mapping be defined entirely at the TeX source level?

Your (ii) seems to me to be a major extension, and probably hard to 
design and do well. After all, we don't have a sequence of character 
codes in a specific font until *after* macro processing; at that point, 
you envisage replacing characters with macros and re-processing? Could 
this process recurse indefinitely? It sounds like this could become an 
extension on a similar scale to Omega's OTPs and all that stuff, and I 
don't want to get into designing something of that scope.

If you need something more than a simple character code remapping for 
certain characters, perhaps those instances could be handled as \active 
characters in TeX, while the "mapping=...." option would allow you to 
remap the majority of simple characters without having to \activate 
just about everything. Reasonable compromise?

> As for mathematics, this would make it *much, much easier* to get
> consistent styles for mathematics in a document using an AAT text-font.
> This is because there are already code-points for slanted/italicized
> math-characters, math symbols, extension-brackets, fraktur, etc.
> Appropriate font-mappings for cmmi, cmsy, cmex  would be easy to write.
> (Even some super/subscripts can be supported without changing 
> font-size!)

I suspect this will prove much harder than you think.

TeX relies heavily on special metrics in the TFM file to control math 
typesetting, and when XeTeX loads and uses an AAT font there is no TFM 
file involved! It measures runs of text by calling ATSUI, but that only 
provides the basic width of a character sequence; it doesn't have 
per-character height, depth, and italic correction. And there's no 
place for XeTeX to get the "extensible recipe" used when constructing 
large delimiters, etc.
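
To make concrete the kind of data I mean, here is a small sketch in 
plain TeX with the standard Computer Modern fonts (nothing in it is 
XeTeX-specific). Every value it reports comes from a TFM file:

	% Font-wide math parameters stored in the TFMs of the math families
	\message{axis height: \the\fontdimen22\textfont2}
	\message{default rule thickness: \the\fontdimen8\textfont3}
	% Per-character height and depth of a glyph in cmex10
	\setbox0=\hbox{\tenex\char"00}
	\message{height \the\ht0, depth \the\dp0}
	% Italic correction, measured as the extra width added by \/
	\setbox0=\hbox{\it f}
	\setbox2=\hbox{\it f\/}
	\dimen0=\wd2 \advance\dimen0 by -\wd0
	\message{italic correction: \the\dimen0}
	\bye

With an AAT font loaded through ATSUI, only the overall width of a 
run corresponds to anything XeTeX can currently ask for.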

So this font-mapping mechanism could give you easier access to simple 
characters in Unicode fonts (while keeping source text in legacy 
"hacked ASCII" encodings), but I'm doubtful that it would enable you to 
replace cmex with a Unicode version. For that, I think we really need a 
Unicode-based extension of the TFM file--which Omega has done, hasn't 
it? But XeTeX doesn't currently read OFMs.

Given the limitations, would this still be a worthwhile extension?

Jonathan